KL-Divergence Guided Temperature Sampling
Temperature sampling is a conventional approach to diversifying large language
model predictions. As the temperature increases, predictions become more diverse
but also more vulnerable to hallucinations -- generating tokens that are sensible
but not factual. One common approach to mitigating hallucinations is to provide
source/grounding documents and train the model to produce predictions that are
bound to, and attributable to, the provided source. There appears to be a
trade-off between diversity and attribution. To mitigate this trade-off, we
propose relaxing the constraint of a fixed temperature across decoding steps,
together with a mechanism that guides the dynamic temperature at each step
according to its relevance to the source, measured by KL-divergence. Our
experiments confirm the trade-off and show that our sampling algorithm
outperforms the conventional top-k and top-p algorithms on conversational
question-answering and summarization tasks.
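A minimal sketch of the idea follows, assuming the relevance signal is the KL
divergence between the source-conditioned and unconditioned next-token
distributions, and that the per-step temperature decreases as that divergence
grows. The mapping and hyper-parameters below are illustrative assumptions, not
the paper's exact algorithm.

    # Sketch of dynamic, KL-guided temperature sampling (illustrative only).
    import numpy as np

    def kl_divergence(p, q, eps=1e-12):
        """KL(p || q) for two probability vectors over the vocabulary."""
        p = np.clip(p, eps, 1.0)
        q = np.clip(q, eps, 1.0)
        return float(np.sum(p * np.log(p / q)))

    def softmax(logits, temperature):
        z = logits / temperature
        z -= z.max()
        e = np.exp(z)
        return e / e.sum()

    def sample_next_token(cond_logits, uncond_logits,
                          t_min=0.3, t_max=1.2, scale=1.0, rng=None):
        """Pick a per-step temperature from how much the source shifts the
        next-token distribution, then sample from the source-conditioned
        logits. t_min, t_max, and scale are assumed hyper-parameters."""
        rng = rng or np.random.default_rng()
        p_cond = softmax(cond_logits, 1.0)
        p_uncond = softmax(uncond_logits, 1.0)
        kl = kl_divergence(p_cond, p_uncond)
        # High divergence -> the source matters here -> cooler, more faithful
        # sampling; low divergence -> allow a hotter, more diverse temperature.
        temperature = t_min + (t_max - t_min) * np.exp(-scale * kl)
        probs = softmax(cond_logits, temperature)
        return int(rng.choice(len(probs), p=probs)), temperature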
LongT5: Efficient Text-To-Text Transformer for Long Sequences
Recent work has shown that either (1) increasing the input length or (2)
increasing model size can improve the performance of Transformer-based neural
models. In this paper, we present a new model, called LongT5, with which we
explore the effects of scaling both the input length and model size at the same
time. Specifically, we integrated attention ideas from long-input transformers
(ETC) and adopted pre-training strategies from summarization pre-training
(PEGASUS) into the scalable T5 architecture. The result is a new attention
mechanism we call Transient Global (TGlobal), which mimics ETC's local/global
attention mechanism but without requiring additional side-inputs. We are able
to achieve state-of-the-art results on several summarization tasks and
outperform the original T5 models on question answering tasks. (Accepted in
NAACL 2022.)
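A rough sketch of the transient-global idea is shown below. The block size,
local radius, and use of simple mean-pooled block summaries as the "transient"
global tokens are assumptions for illustration; the released LongT5
implementation differs in details such as relative position biases and learned
projections.

    # Sketch of local attention plus on-the-fly block-summary ("transient
    # global") tokens, in the spirit of TGlobal (illustrative only).
    import numpy as np

    def tglobal_attention(x, radius=2, block=4):
        """x: (seq_len, d_model). Each token attends to a local window plus
        per-block summary tokens computed on the fly, with no side-inputs."""
        n, d = x.shape
        # Transient global tokens: mean of each fixed-size block of the input.
        n_blocks = int(np.ceil(n / block))
        globals_ = np.stack([x[i * block:(i + 1) * block].mean(axis=0)
                             for i in range(n_blocks)])
        out = np.zeros_like(x)
        for i in range(n):
            lo, hi = max(0, i - radius), min(n, i + radius + 1)
            keys = np.concatenate([x[lo:hi], globals_], axis=0)  # local + global
            scores = keys @ x[i] / np.sqrt(d)
            weights = np.exp(scores - scores.max())
            weights /= weights.sum()
            out[i] = weights @ keys  # values taken equal to keys in this sketch
        return out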
Knowledge Prompts: Injecting World Knowledge into Language Models through Soft Prompts
Soft prompts have recently been proposed as a tool for adapting large frozen
language models (LMs) to new tasks. In this work, we repurpose soft prompts for
the task of injecting world knowledge into LMs. We introduce a method to train
soft prompts via self-supervised learning on data from knowledge bases. The
resulting soft knowledge prompts (KPs) are task-independent and work as an
external memory for the LMs. We perform qualitative and quantitative experiments
and demonstrate that: (1) KPs can effectively model the structure of the
training data; (2) KPs can be used to improve the performance of LMs on
different knowledge-intensive tasks.
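A minimal sketch of how such prompts might act as an external memory for a
frozen LM: learnable prompt vectors are looked up and prepended to the input
embeddings while the LM's own weights stay frozen. The per-entity lookup,
prompt length, and interface below are assumptions for illustration, not the
authors' implementation.

    # Sketch of soft knowledge prompts prepended to a frozen LM's inputs.
    import torch
    import torch.nn as nn

    class KnowledgePrompts(nn.Module):
        def __init__(self, num_entities, prompt_len, d_model):
            super().__init__()
            # One trainable soft prompt (prompt_len x d_model) per KB entity.
            self.prompts = nn.Embedding(num_entities, prompt_len * d_model)
            self.prompt_len, self.d_model = prompt_len, d_model

        def forward(self, entity_ids, token_embeds):
            """entity_ids: (batch,), token_embeds: (batch, seq, d_model).
            Returns embeddings with each entity's soft prompt prepended; the
            frozen LM then runs on the extended sequence unchanged."""
            kp = self.prompts(entity_ids).view(-1, self.prompt_len, self.d_model)
            return torch.cat([kp, token_embeds], dim=1)

    # Usage note: only self.prompts would be updated during training, e.g.
    # for p in frozen_lm.parameters(): p.requires_grad_(False)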
- …